Optimizing a Fast Stream Cipher for VLIW, SIMD, and Superscalar Processors
Author
Abstract
The mismatch between traditional cipher designs and efficient operation on modern Very Long Instruction Word, Single Instruction Multiple Data, superscalar, and deeply pipelined processors is explored. Guidelines are developed for efficiently exploiting the instruction-level parallelism of these processor architectures. Two stream ciphers, WAKE-ROFB and WiderWake, incorporating these ideas are proposed. WAKE-ROFB inherits the security characteristics of WAKE, from which it is derived, but runs almost three times as fast as WAKE on a commercially available VLIW CPU. Throughput in excess of 40 MByte/s on a 100 MHz processor is demonstrated. Another derivative, WiderWake, whose security characteristics are not directly transferable from WAKE, runs in excess of 50 MByte/s on the same processor.
Similar resources
A Comparison Between Processor Architectures for Multimedia Applications
The efficient processing of MultiMedia Applications (MMAs) is currently one of the main bottlenecks in the media processing field. Many architectures have been proposed for processing MMAs such as VLIW, superscalar (general-purpose processor enhanced with a multimedia extension such as MMX), vector architectures, SIMD architectures, and reconfigurable computing devices. The question then arises...
Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures
This paper aims to provide a quantitative understanding of the performance of DSP and multimedia applications on very long instruction word (VLIW), single instruction multiple data (SIMD), and superscalar processors. We evaluate the performance of the VLIW paradigm using Texas Instruments Inc.’s TMS320C62xx processor and the SIMD paradigm using Intel’s Pentium II processor (with MMX) on a set o...
Evaluating Compiler Support for Complexity Effective Network Processing
Statically scheduled processors are known to enable low complexity hardware implementations that lead to reduced design and verification time. However, statically scheduled processors are critically dependent on the compiler to exploit instruction level parallelism and deliver higher performance. In order to ascertain the suitability of statically scheduled processors for network processing (wh...
Optimizing Matrix-matrix Multiplication for an Embedded VLIW Processor
The optimization of matrix-matrix multiplication (MMM) performance has been well studied on conventional general-purpose processors like the Intel Pentium 4. Fast algorithms, such as those in the Goto and ATLAS BLAS libraries, exploit common microarchitectural features including superscalar execution and the cache and TLB hierarchy to achieve near-peak performance. However, the microarchitectur...
Extending Static Synchronization Beyond SIMD and VLIW
A key advantage of SIMD (Single Instruction stream, Multiple Data stream) architectures is that synchronization is effected statically at compile-time, hence the execution-time cost of synchronization between “processes” is essentially zero. VLIW (Very Long Instruction Word) machines are successful in large part because they preserve this property while providing more flexibility in terms of wh...